Abstract

This paper presents a computationally efficient yet powerful binary framework for robust facial representation based on 
image gradients. It is termed as structural binary gradient patterns (SBGP). To discover underlying local structures in 
the gradient domain, we compute image gradients from multiple directions and simplify them into a set of binary strings. 
The SBGP is derived from certain types of these binary strings that have meaningful local structures and are capable of 
resembling fundamental textural information. They detect micro orientational edges and possess strong orientation and 
locality capabilities, thus enabling great discrimination. The SBGP also benefits from the advantages of the gradient
domain and exhibits profound robustness against illumination variations. The binary strategy realized by pixel correlations
in a small neighborhood substantially simplifies the computational complexity and achieves extremely efficient
processing, taking only 0.0032 s in Matlab for a typical face image. Furthermore, the discrimination power of the SBGP can be
enhanced on a set of defined orientational image gradient magnitudes, further enforcing locality and orientation. Results
of extensive experiments on various benchmark databases illustrate significant improvements of the SBGP based
representations over the existing state-of-the-art local descriptors in terms of discrimination, robustness and complexity.
Codes for the SBGP methods will be available at http://www.eee.manchester.ac.uk/research/groups/sisp/software/.

1. Introduction

Face recognition has been one of the most active topics
in image and pattern recognition due to its much increased
attention and applications in law enforcement, surveillance,
human-computer interaction, etc. Although tremendous
progress has been made over the last two decades, it is
still regarded as an unsolved problem in real-world situations,
where large within-class variations in facial appearance
exist (e.g. illuminations, expressions and poses). A
key solution lies in facial representation and a great deal
of effort has been devoted to it. A desirable facial descriptor
should be discriminative to inter-person differences but
robust to intra-person variations, and at the same time,
efficient to process.

Appearance-based methods, one of the widely adopted
approaches, consider face images as holistic vectors of pixel
intensities in high-dimensional space. Dimensionality
reduction and manifold techniques are typically applied to
reduce the dimensionality and to extract intrinsic features
[1]. Typical early examples are Eigenfaces [2] and
Fisherfaces [3], where linear PCA is used. They have been
enhanced by nonlinear PCA and manifold methods
[4-7]. However, faces represented by pixel intensities are
sensitive to variations such as occlusions, illumination,
expression and pose.

* Corresponding author

Email addresses: wl.huang@siat.ac.cn (Weilin Huang),
h.yin@manchester.ac.uk (Hujun Yin)

Zhang et al. [8] have proposed a novel descriptor, termed
Gradientfaces, to extract illumination insensitive
features in the image gradient domain. Faces are described
by using image gradient orientation (IGO) instead of intensity
to achieve strong robustness to illumination change.
To further take advantage of gradient features, Tzimiropoulos
et al. [9] derived a similarity measure based on the cosine of
IGO differences between images (termed IGOcos) and
showed that the measure considerably mitigated the effect
of variations and enhanced PCA based recognition.
However, the similarity measures of Gradientfaces and
IGOcos are based on pixel-wise correlations, and hence are
holistic representations, which are sensitive to local deformations,
rotations and spatial scales. That is, they are
prone to facial variations such as expressions and poses.

Local feature descriptors have recently gained considerable
attention due to their resilience to multiple visual
variations by enforcing spatial locality in both pixel and
patch levels. Two of the most successful local descriptors
are Gabor wavelets and local binary patterns
(LBP). Gabor features extract both micro texture
details and global shape information from spatial and
spatial-frequency domains and are robust to local distortions,
leading to certain successes in face recognition.
However, Gabor representations are time-consuming
to extract and also generate a large number of features
with the convolution kernels, making them prohibitive for
real-time applications. By contrast, the LBP features are simple,
efficient and yet resistant to illumination changes, and
they are also capable of detecting micro texture, e.g. spots,
corners and edges [15]. However, the capability of the LBP
descriptor can be severely affected by drastic changes of
pixel intensity, such as extreme lighting. Most current local
facial descriptors that are built on the Gabor and LBP
also suffer from these inherent limitations.

Building on the properties gained from the IGO domain
and local binary features, this paper presents a new
local facial descriptor, termed structural binary gradient
patterns (SBGP), for facial images. It measures
relationships between local pixels in the image gradient domain
and effectively encodes the underlying local structures
into a set of binary strings, not only increasing the
discriminative power but also significantly reducing the
complexity. We observe that the structural patterns of
SBGP are capable of detecting stable micro edge texture
from various directions. Local features built on the histogram
statistics from these orientational edge textures
contain the primary structural information of biological
vision systems and exhibit desirable characteristics of spatial
locality, orientation and scale selectivity. They show
stronger orientational power than the LBP and Gabor features,
leading to improved discriminative representation.
Furthermore, an enhanced descriptor can be devised by
building SBGP patterns on a set of orientational image
gradient magnitudes (OIGM), termed SBGPM, to further
enforce its locality and orientation. Extensive experiments
on several benchmark databases demonstrate the
significant advantages of the SBGP-based methods over
state-of-the-art methods with respect to discrimination,
robustness and complexity.

Next, a brief review of related work is given in Section
2. The proposed SBGP descriptor is then presented in
Section 3. Section 4 discusses favorable properties of the
SBGP descriptor and connections and distinctions among
SBGP, LBP and Gabor representations. Section 5 describes
the enhanced SBGPM descriptor. Finally, experimental
verifications are provided in Section 6, followed by
conclusions in Section 7.

2. Related work 

Local histograms built on IGO statistics have been considered
as visually prominent features and have favorable
properties such as invariance against illumination. In [8],
Zhang et al. computed Gradientfaces by using the IGO
representation instead of intensities to obtain an illumination
insensitive measure. They showed that features extracted
from the gradient domain are more discriminative and robust
than those in the intensity domain, and are even more tolerant
to illumination variations than the methods based on
the reflectance model [22, 23, 24]. Similarly, Tzimiropoulos
et al. [9] presented a simple yet robust similarity measure
based on the IGO representation and the cosine of IGO
differences between images (IGOcos). PCA is then
applied in the IGO space to generate a more compact,
discriminant and robust representation, referred to as
IGO-PCA.

Recently, a number of local facial descriptors have been
derived from the Gabor or LBP features or their combinations.
The local Gabor binary pattern histogram
sequence (LGBPHS) was proposed by first running Gabor
filters on face images and then building LBP histogram
features on the resulting Gabor magnitude faces. Similar
methods include the histogram of Gabor phase patterns
(HGPP) [21] and Gabor volume based LBP (GV-LBP) [25].
The advantages of these methods are built
on the virtues of both Gabor and LBP descriptors. However,
they commonly suffer from the difficulties of Gabor
based representations, i.e. high computational complexity
and high dimensionality.

As a simpler approach, Jie et al. proposed a Weber
local descriptor (WLD) based on Weber's Law of
human perception, which states that the noticeable
change of a stimulus is a constant ratio of the original
stimulus. Tan and Triggs presented local ternary
patterns (LTP) by extending LBP to 3-valued codes to
increase its robustness to noise in near-uniform image
regions. Both methods have been shown to be highly discriminative
and resistant to illumination changes, extending
the advantages of LBP. However, similar to LBP, both
descriptors build local relationships in the intensity domain,
which can be seriously affected by dramatic changes
of pixel intensity.

The proposed SBGP is closely related to the center-symmetric
local binary pattern (CS-LBP) [26], which computes
local binary patterns from symmetric neighboring pixels. However,
the SBGP differs distinctly in three aspects. First,
structural patterns and multiple spatial resolutions are defined
in the SBGP. We show some theoretical insights that
the structural patterns of SBGP work as oriented edge
detectors, a key to discriminative and compact representation.
The multiple spatial resolution strategy increases
the descriptor's flexibility with stronger discriminative power.
Second, motivated by the multiple channels strategy of
invariant descriptors such as SIFT [27] and POEM,
we build the SBGP descriptor on a set of orientational
image gradient magnitudes. This further enhances
its discriminative power. Finally, the CS-LBP was originally
developed for image matching, while the SBGP is
proposed for face recognition. The task of face recognition
often requires more detailed and robust local features than
general features for matching. As will be shown in
Section 6.2.1, the CS-LBP is highly sensitive to significant
illumination changes.

Figure 1: Performance of Gradientfaces, IGOcos, LBP and HIGO against (a) scream, (b) occlusion, (c) and (d) illuminations. Block numbers
for LBP and HIGO are the same, 20 for AR and 24 for YaleB, where both methods reached stable performance.


3. Structural Binary Gradient Patterns 

This section begins with a discussion of histogram statistics
of IGO, followed by an introduction of the proposed SBGP
descriptor. SBGP computes image gradients from multiple
directions in order to extract a set of binary numbers
for describing local structures.

Given the excellent properties of the IGO representation,
it is natural to extract robust facial features
from the IGO domain and, at the same time, to
improve robustness against local distortions by enforcing
block-level locality. A straightforward approach is to
directly compute histogram statistics of the IGO representation
(HIGO) by dividing a face image into a number of
non-overlapping blocks. Each block is represented by an
IGO histogram, whose bin number is determined by the
segmentation of [0, 2π).
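The block-wise HIGO construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `higo_features`, the use of central differences for the gradients and the skipping of the one-pixel image border are assumptions made for the example.

```python
import math

def higo_features(image, blocks_per_side=2, n_bins=4):
    """Concatenate per-block histograms of quantized gradient orientation.

    `image` is a list of rows of numbers. The gradient operator (central
    differences) and the border handling are assumptions of this sketch.
    """
    h, w = len(image), len(image[0])
    feats = [[0] * n_bins for _ in range(blocks_per_side ** 2)]
    # Skip the one-pixel border so central differences are always defined.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g_v = image[y + 1][x] - image[y - 1][x]  # vertical gradient
            g_h = image[y][x + 1] - image[y][x - 1]  # horizontal gradient
            theta = math.atan2(g_v, g_h) % (2.0 * math.pi)  # in [0, 2*pi]
            # Quantize into n_bins equal sectors (clamped for rounding).
            b = min(int(theta / (2.0 * math.pi) * n_bins), n_bins - 1)
            # Index of the non-overlapping block that contains (y, x).
            block = (y * blocks_per_side // h) * blocks_per_side \
                + (x * blocks_per_side // w)
            feats[block][b] += 1
    return [count for hist in feats for count in hist]

# Usage: a horizontal intensity ramp has orientation 0 everywhere.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
features = higo_features(ramp, blocks_per_side=2, n_bins=4)
```

The feature length is the number of bins times the number of blocks, which is why fewer bins translate directly into a more compact descriptor.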

To illustrate the advantage of HIGO, two simple experiments
were conducted on two databases. On the AR
database, 100 subjects with the group of natural faces
were used as gallery images, and two groups of faces with
scream expressions and scarf occlusion (both cause large-scale
local distortions) were presented as probe images.
Each group included 100 images from the first session.
On the YaleB database, a subset of 10 subjects was used.
The faces with the most natural light sources were used
as galleries and two sets with medium and high illumination
conditions (corresponding to sets 4 and 5 in [28]) were
presented as the probes. We evaluated the IGO based
representations, namely Gradientfaces [8] and IGOcos [9], and
local histogram methods, i.e. LBP [29] and HIGO (with respect
to bin numbers). The results are illustrated in Fig. 1
(systematic evaluations are reported in Section 6). The
experiments here aim to show the experimental cue that
motivated the derivation of the proposed descriptor. Fig. 1
provides two observations. First, local features
seem more robust to local deformations in expression and
occlusion. Second, IGO methods are more capable than
LBP in dealing with illumination changes. HIGO, taking
advantage of both, achieves stronger robustness to these
effects.

Our goal was to develop a descriptor that can effectively
integrate the advantages of both approaches, while
still being computationally efficient. For this consideration,
one has to trade off between complexity and discrimination
with acceptable loss of information. Indeed,
HIGO often yields reasonable performance by
using fewer bins (e.g. four). Accordingly, it seems appropriate
to build a feature model based on the four-bin IGO
histograms. Some insights are discussed next.


3.1. Theoretical Analysis of Four-Bin HIGO 

IGO is computed by a four-quadrant inverse tangent,
which can be formulated as

$$\theta_{x,y} = \operatorname{arctan2}\left(\operatorname{sign}(G^{v}_{x,y})\,|G^{v}_{x,y}|,\ \operatorname{sign}(G^{h}_{x,y})\,|G^{h}_{x,y}|\right) \qquad (1)$$

where $\operatorname{sign}(G^{v}_{x,y})$ and $\operatorname{sign}(G^{h}_{x,y})$ return the signs of the
gradients in the vertical and horizontal directions, and $|G^{v}_{x,y}|$
and $|G^{h}_{x,y}|$ are the gradient contrasts. In a four-bin histogram,
each bin counts the number of pixels whose IGO
values are located in one of the four quadrants, e.g. $\theta_{x,y} \in [0, \pi/2)$.
Hence IGO values are quantized into four discrete values,
$\{0, \pi/2, \pi, 3\pi/2\}$, by discarding gradient contrast.

In a four-bin HIGO, pattern labels can be directly computed
from the four different combinations of two gradient signs,
[+ +], [+ -], [- +] and [- -], which are naturally suited
to a binary strategy. Similar to LBP [15], the signs of the
gradients are not affected by changes of mean intensities,
yielding a distinct ability to resist gray-scale variations.
Subsequently, we generate two binary numbers to
describe the patterns of four-bin HIGO as 11, 10, 01 and
00. While LBP discards intensity contrast, two-bit HIGO
discards gradient contrast to achieve illumination invariance
as well as computational efficiency. To this end, we
have derived the basic local binary features from the IGO
domain, which serve as the basis of the proposed SBGP
descriptor.
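The equivalence between the four-bin quantization and the two gradient signs can be checked with a short sketch. The bit order used here (vertical sign first, horizontal sign second) and the names `two_bit_code`, `quadrant` and `QUADRANT_OF_CODE` are assumptions for illustration; the text does not fix which sign comes first.

```python
import math

# Assumed bit order: (sign of vertical gradient, sign of horizontal gradient).
QUADRANT_OF_CODE = {(1, 1): 0, (1, 0): 1, (0, 0): 2, (0, 1): 3}

def two_bit_code(g_v, g_h):
    """Two-bit pattern built only from the gradient signs (1 for >= 0)."""
    return (1 if g_v >= 0 else 0, 1 if g_h >= 0 else 0)

def quadrant(g_v, g_h):
    """Index of the four-bin IGO histogram bin via the inverse tangent."""
    theta = math.atan2(g_v, g_h) % (2.0 * math.pi)
    return int(theta // (math.pi / 2.0)) % 4

# For non-zero gradients both routes select the same bin, and rescaling
# the gradients (a change of contrast) leaves the code untouched.
for g_v in (-5.0, -1.0, 2.0, 7.0):
    for g_h in (-4.0, -0.5, 1.0, 3.0):
        code = two_bit_code(g_v, g_h)
        assert QUADRANT_OF_CODE[code] == quadrant(g_v, g_h)
        assert two_bit_code(10.0 * g_v, 10.0 * g_h) == code
```

The sign test avoids the inverse tangent altogether, which is where the computational saving of the binary strategy comes from. Gradients that are exactly zero fall on a bin boundary and are excluded from the check.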


3.2. Binary Image Gradients from Multiple Directions 

The traditional IGO is computed on gradients in the horizontal
and vertical directions. Its pixel-level locality is
realized by using only four neighbors in two orthogonal
directions. Most current high-performing local descriptors
extract meaningful local information from at least
eight neighbors and their discriminative power can be improved
by suitably increasing the number of neighbors
[20, 25, 30]. Similarly, it can be expected
that greater discrimination can be achieved in the gradient
domain by involving more local neighbors from multiple
directions.

Figure 2: Basic SBGP operator: (a) eight neighbors of a central
pixel (115), (b) four correlation directions: G1, G2, G3 and G4, (c)
principal (in red or bold) and associated (in black or plain) binary
numbers, resulting string, 0111_2, or label, L = 07.

Following this intuition, we further extend the four-bin
HIGO to multiple directions, resulting in the proposed
new facial descriptor, binary gradient patterns (BGP).
Specifically, the BGP computes binary correlations between
symmetric neighbors of a central pixel from multiple
(k) directions. The number of neighbors is twice
the number of directions. The computation is simple. A
basic BGP operator of four directions is presented in Fig. 2
and detailed below:

1). A set of local neighbors of a central pixel is first
given (e.g. eight neighbors in Fig. 2(a)).

2). Then, a pair of binary numbers, including a principal
binary number ($B_i^+$) and an associated binary number
($B_i^-$), is computed by correlating two symmetric neighbors
in each direction based on Eqs. (2) and (3), so that eight
binary numbers in total are obtained from four directions, G1, G2,
G3 and G4, as shown in Fig. 2(b) and (c):

$$B_i^+ = \begin{cases} 1 & \text{if } G_i^+ - G_i^- \ge 0 \\ 0 & \text{if } G_i^+ - G_i^- < 0 \end{cases} \qquad (2)$$

$$B_i^- = 1 - B_i^+, \qquad i = 1, 2, \ldots, k \qquad (3)$$

where $G_i^+$ and $G_i^-$ are the intensity values of the pixels
at the corresponding locations in Fig. 2(b).

3). Finally, the label of the central pixel is computed from
the resulting four principal binary numbers:

$$L = \sum_{i=1}^{k} 2^{i-1} B_i^+ \qquad (4)$$

Although eight binary numbers are obtained in four
directions, the principal and associated binary numbers in
each direction are always complementary. Hence, there
are only two possible states in each direction, which
require only a single binary number (bit) to describe. For a compact
representation, only the principal binary numbers are
required for computing the labels by Eq. (4), which is capable of
describing all possible variations of the BGP patterns. The
number of BGP labels ($N_L$) is determined by the number
of principal binary numbers (bits), equal to the number
of directions ($k$): $N_L = 2^k$. Thus the possible labels of a
$k$-directional BGP operator are $L \in \{0, 1, 2, \ldots, 2^k - 1\}$.
Note that this number ($2^k$) is substantially smaller than
the number of LBP labels ($2^{2k}$). For example, in a typical
model with sixteen neighbors, the numbers of labels
for BGP and LBP are 256 and 65536, respectively. The
proposed BGP operator efficiently integrates the merits
of IGO features and local histogram representations with
extremely low computational cost.
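A small worked instance of Eqs. (2)-(4) may help. The four intensity differences below are hypothetical values chosen for illustration (they are not read from Fig. 2); they are picked so that the result agrees with the string 0111_2 and label 07 quoted in the caption of Fig. 2.

```latex
% Hypothetical differences G_i^+ - G_i^- for i = 1, ..., 4
G_1^+ - G_1^- = 12, \quad G_2^+ - G_2^- = 0, \quad G_3^+ - G_3^- = 3, \quad G_4^+ - G_4^- = -7

% Eq. (2): principal bits; Eq. (3): associated bits (always complementary)
(B_1^+, B_2^+, B_3^+, B_4^+) = (1, 1, 1, 0), \qquad
(B_1^-, B_2^-, B_3^-, B_4^-) = (0, 0, 0, 1)

% Eq. (4): label from the principal bits only
L = 2^{0} \cdot 1 + 2^{1} \cdot 1 + 2^{2} \cdot 1 + 2^{3} \cdot 0 = 7 = 0111_2
```

Note that a zero difference yields a principal bit of 1, since Eq. (2) uses a non-strict inequality.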

3.3. Structural BGP 

There are sixteen different labels from a four-directional
BGP descriptor. The binary structures of these labels,
ranging from 0 to 15, are shown in Fig. 3(a). As can
be seen, each label is constructed from eight binary numbers
(bits), including four bits of "1" and four bits of "0".
The principal bits are presented in red or bold. It is interesting
to investigate the distributions of "1"s and "0"s
in different labels. It can be seen that certain labels have
meaningful structures where the four bits of "1" are located
consecutively. There are eight labels having continuous
bits of "1" (marked as red or bold-lined boxes in Fig. 3(a)),
while the "1"s in the other eight labels are discontinuous
(marked as black or thin-lined boxes). These continuous
"1"s indicate more stable local changes in texture and essentially
describe the orientations of "edge" texture. An
observation is that statistics on these patterns are highly
stable and meaningful for characterizing local structures. By
contrast, labels with discontinuous "1"s include arbitrary
changes of local texture, likely to indicate noise or outliers.
Furthermore, from experimental statistics, patterns
having the continuous "1"s often make up a vast majority
in a typical BGP face, e.g. about 95 percent. The statistics
of BGP patterns of various labels on 2600 face images
from the AR database are presented in Fig. 4(a).

Based on these observations, we define the patterns
having continuous "1"s as structural binary gradient patterns
(SBGP), and refer to the others as non-structural patterns.
This yields in total eight different labels for the
structural patterns (as listed in Fig. 3(b)) while discarding
all non-structural ones. Therefore, only eight bins are
needed in the SBGP histogram. This is an appealing property,
not only helping to rule out noise and outliers in face
images, but also further reducing feature dimensions. For
example, even with 24 neighbors, the bin number of the
SBGP histogram is only 24, compared with the $2^{12} = 4096$ bins
of the CS-LBP histogram [26]. Further discussions and
evaluations are presented in the next section.
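The definition of structural patterns can be checked by brute force for the four-directional case. The sketch below assumes that the eight bits are arranged around the neighborhood ring as the four principal bits followed by their four complements, so that opposite positions hold complementary bits; the helper name `is_structural` is hypothetical.

```python
def is_structural(label, k=4):
    """True if the k ones of the 2k-bit circular string form one run.

    Assumed ring layout: principal bits B_1^+..B_k^+ followed by the
    associated bits B_1^-..B_k^- (each the complement of its partner).
    """
    principal = [(label >> i) & 1 for i in range(k)]
    ring = principal + [1 - b for b in principal]
    n = 2 * k
    # A single consecutive run of ones has exactly one 0 -> 1 transition
    # when walking once around the ring (ring[-1] wraps to the last bit).
    rises = sum(1 for i in range(n) if ring[i - 1] == 0 and ring[i] == 1)
    return rises == 1

# Enumerate all sixteen labels of the four-directional operator.
structural = [label for label in range(16) if is_structural(label)]
print(structural)  # [0, 1, 3, 7, 8, 12, 14, 15]
```

Exactly eight of the sixteen labels pass the test, and label 7 (string 0111_2, as in the caption of Fig. 2) is among them.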

3.4. Spatial Resolutions

The basic SBGP descriptor (Fig. 2) is computed from
four directions (k = 4) in a square neighborhood with a side
length of two units. Similar to LBP based descriptors, the
capability of SBGP can be further improved by increasing
the number of gradient directions and by enlarging the
neighborhood. To this end, we define the spatial resolution
of SBGP by the number of neighbors/directions and the radius
of the square, denoted as (P, R). Typically, the maximum
number of neighbors is eight times the radius,
$P_{max} = 8R$, e.g. (8, 1), (16, 2) and (24, 3). The SBGP
descriptor with structural patterns at a spatial resolution of
(P, R) is referred to as $SBGP_{P,R}$. Assuming that the number
of neighbors is maximized with respect to the radius, we
present a generalized algorithm for computing the SBGP
operator for a given pixel at location (i, j), with spatial
resolution (P, R); see Algorithm 1 for details.

Figure 3: Definition of BGP structural and non-structural patterns.

Figure 4: Histogram statistics of BGP and LBP patterns on the AR
database. (a) BGP patterns; the x-axis corresponds to the 16 different labels
presented in Fig. 3. (b) LBP patterns; labels 0-8 and 9 are
uniform and non-uniform patterns, respectively.

Algorithm 1 returns the label values of all pixels. In
this framework, features are built on histograms of the
SBGP structural patterns. One needs to know the number
of the structural labels and their values. This information
is independent of face images, and is determined only by
the given spatial resolution. From Fig. 3(a), we can see
that the four continuous "1"s in the eight structural labels run
through all locations of the eight neighbors, indicating that the
number of structural labels ($N_{sp}$) is equal to the number
of neighbors, $N_{sp} = P$, compared to $2^P$ for the LBP and
$2^{P/2}$ for the CS-LBP [26]. Also, based on the distributions of
the principal bits of the structural labels (Fig. 3(b)), we devise
Algorithm 2 for computing structural labels at a resolution
of (P, R).


Algorithm 1 Computing SBGP descriptor

Require: Location of a given pixel (i, j), spatial resolution
(P, R) and pixel intensity $I$.

Ensure: Label of SBGP descriptor, $L$.

1: step one: compute principal binary numbers in k directions, k = P/2.
2: t = 1
3: for $n_1 = -R \to R$ do
4:   $B_t^+ = \begin{cases} 1 & \text{if } I(i+n_1, j+R) - I(i-n_1, j-R) \ge 0 \\ 0 & \text{if } I(i+n_1, j+R) - I(i-n_1, j-R) < 0 \end{cases}$   (5)
5:   t = t + 1
6: end for
7: for $n_2 = -(R-1) \to (R-1)$ do
8:   $B_t^+ = \begin{cases} 1 & \text{if } I(i+R, j-n_2) - I(i-R, j+n_2) \ge 0 \\ 0 & \text{if } I(i+R, j-n_2) - I(i-R, j+n_2) < 0 \end{cases}$   (6)
9:   t = t + 1
10: end for
11: step two: compute $L$ by Eq. (4).
12: return Label of pattern, $L$.
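Algorithm 1 can be transcribed into Python fairly directly. The sketch below is a hedged reading of the listing rather than the authors' code: it assumes that `image[i][j]` indexes row i and column j, that the pixel lies at least `radius` pixels away from the image border, and that P = 8R; the function name `sbgp_label` is hypothetical.

```python
def sbgp_label(image, i, j, radius=1):
    """Label of pixel (i, j) from the principal bits on a square ring.

    Follows the two loops of Algorithm 1: 2R + 1 directions anchored on
    the column j + R, then 2R - 1 directions anchored on the row i + R,
    giving k = 4R principal bits in total.
    """
    bits = []
    for n1 in range(-radius, radius + 1):
        diff = image[i + n1][j + radius] - image[i - n1][j - radius]
        bits.append(1 if diff >= 0 else 0)
    for n2 in range(-(radius - 1), radius):
        diff = image[i + radius][j - n2] - image[i - radius][j + n2]
        bits.append(1 if diff >= 0 else 0)
    # Eq. (4): weight 2^(t-1) for the t-th principal bit (t is 1-based).
    return sum(bit << t for t, bit in enumerate(bits))

# Usage on a 3 x 3 patch; only bit 0 is cleared, so the label is 14.
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
label = sbgp_label(patch, 1, 1, radius=1)
```

Since only the signs of intensity differences are used, replacing every pixel value v by a*v + b with a > 0 leaves the label unchanged.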


Algorithm 2 Computing structural label

Require: Number of neighbors, P.

Ensure: Labels of structural patterns, $\{L_t^{sp}\}_{t=1}^{P}$.

1: Number of directions, k = P/2.
2: for $t = 1 \to P$ do
3:   if $t \le k$ then
4:     $L_t^{sp} = 2^{t-1} - 1$
5:   else
6:     $L_t^{sp} = 2^{k} - 2^{t-k-1}$
7:   end if
8: end for
9: return $\{L_t^{sp}\}_{t=1}^{P}$.
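A possible Python reading of Algorithm 2 is sketched below. The first branch follows the listing; the second branch is garbled in the source, so the closed form 2^k - 2^(t-k-1) used here is an assumption, chosen because for P = 8 it reproduces the eight labels whose four "1"s are consecutive. The function name `structural_labels` is hypothetical.

```python
def structural_labels(num_neighbors):
    """Structural labels for P neighbors (k = P / 2 directions).

    The first k labels have their low bits set (0, 1, 3, 7, ...). The
    remaining k labels have their high bits set (..., 14, 12, 8); this
    second formula is a reconstruction, not taken from the source.
    """
    k = num_neighbors // 2
    labels = []
    for t in range(1, num_neighbors + 1):
        if t <= k:
            labels.append(2 ** (t - 1) - 1)
        else:
            labels.append(2 ** k - 2 ** (t - k - 1))
    return labels

print(structural_labels(8))  # [0, 1, 3, 7, 15, 14, 12, 8]
```

The list always contains P distinct values below 2^k, which is consistent with the statement that the number of structural labels equals the number of neighbors.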


4. Analysis, Discussions and Comparisons 

This section discusses favorable characteristics of the
SBGP descriptor and systematically compares it with two
fundamental descriptors, LBP and Gabor wavelets, in terms
of discrimination, robustness and complexity. Theoretical
insights can be gained by discussing the underlying connections
and distinctions among these descriptors, along
with experimental studies.

Both SBGP and LBP employ an advantageous binary strategy
for extracting pixel correlations in local neighborhoods.
However, SBGP differs from LBP in computing binary correlations,
leading to distinctive properties between them.

Figure 5: Demonstrations of Gabor, LBP and SBGP faces. Top: original AR face and Gabor magnitude faces of eight orientations and a
fixed scale. Middle: LBP face and location maps of eight uniform and one nonuniform patterns. Bottom: $SBGP_{8,1}$ face and location
maps of eight structural and one nonstructural patterns, corresponding to the labels in Fig. 3.

Figure 6: Connections between LBP uniform patterns and SBGP
structural patterns. One of LBP UP04 can be transformed to SBGP
SP07, and one of LBP UP03 or UP05 can be transformed to SBGP
SP07 or SP15. But different structures of LBP UP04 may relate
to different types of SBGP structural patterns. Other types of LBP
uniform patterns are not guaranteed to be transformed to SBGP
structural patterns.

4.1. Discrimination

The proposed SBGP descriptor is essentially an oriented
edge detector with stronger orientational and discriminative
capabilities than the LBP and Gabor representations.
Fig. 5 illustrates the outputs of the $SBGP_{8,1}$
and LBP descriptors on a typical face image (from the
AR database). It is evident that LBP detects various local
textural features such as spots, corners and edges, while
SBGP extracts oriented edge features. The $SBGP_{8,1}$
face captures more facial information than the LBP face.
Location maps of the SBGP structural patterns are more
informative and discriminative than those of LBP uniform
patterns. The histogram statistics of the two descriptors
in Fig. 4 show that the distributions of SBGP structural patterns
are fairly even, while the distributions of LBP uniform
patterns peak mainly at a few patterns (UP04, UP05 and
UP03). This means that all SBGP structural patterns contribute
evenly, while the LBP representation is dominated by
a few patterns.

As stated in [15], these three types of LBP patterns detect
edge information from textural images, which leads to the
finding that edge information dominates local textural features
of face images. In fact, the local structures described
by UP03, UP04 or UP05 of the LBP are guaranteed to
be represented by one of the SBGP structural labels, as
shown in Fig. 6, determined by their orientation. In other
words, LBP fuses all directions of the patterns into a single
label, while SBGP separately counts differently
oriented edges in eight labels to increase discriminative
power. Although HOG also computes histograms from multiple
orientations, the gradient orientations used by HOG
are only computed from four local neighbors, which cannot
effectively detect edge information and are insufficient
to meaningfully capture the local structures.

Furthermore, we argue that the SBGP possesses some
essential properties of the human vision system, which is
characterized by spatial locality, orientation and scale selectivity
[31], and responds strongly to specifically oriented
lines or edges positioned in its receptive fields [32]. Gabor
wavelets are a well-known model for describing these
properties and have had considerable success in image
feature representation [13, 16]. The inherent characteristics
of the SBGP can resemble these properties by enforcing
locality at both pixel and block levels, by taking the statistics
of edge orientations to improve orientational capability,
and by defining a tunable spatial resolution for scale
selection. Fig. 5 provides an intuitive view that SBGP
faces preserve stronger orientations than Gabor faces.

Figure 7: Histogram statistics of pixel intensity, LBP and SBGP
patterns on the two blocks (15 × 15) of two faces in (a) (same identity)
with significantly different illuminations.

4.2. Robustness

LBP achieves gray-scale invariance by discarding intensity
contrast. SBGP employs a heuristic from the IGO
representation to further take advantage of the gradient
domain, where local representation is inherently invariant
against illumination changes. The illumination invariance
of IGO based representations has been verified in the reflectance
model by canceling out illumination functions of
different directions when computing the ratio of gradients [8].
Benefiting from this merit, SBGP achieves stronger robustness
to illumination variation than LBP by further
discarding gradient contrast (as discussed in Section 3.1).
Fig. 7 shows the histogram statistics of LBP and SBGP
patterns on two exemplar face blocks, illustrating the improved
robustness of SBGP against extreme illumination
conditions.
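One way to see why the sign-based bits resist illumination change is the short derivation below. It rests on a simplifying assumption made here for illustration, namely that the illumination change acts on a small neighborhood as an affine map I' = aI + b with a > 0; this is not the reflectance-model argument of [8].

```latex
% Primed quantities are pixel values after the (assumed) affine change.
G_i^{+\prime} - G_i^{-\prime}
  = (a\,G_i^{+} + b) - (a\,G_i^{-} + b)
  = a\,(G_i^{+} - G_i^{-}), \qquad a > 0

% The sign of each difference, and hence each principal bit, is preserved.
G_i^{+\prime} - G_i^{-\prime} \ge 0 \iff G_i^{+} - G_i^{-} \ge 0
\;\Longrightarrow\; B_i^{+\prime} = B_i^{+} \ \text{for all } i
\;\Longrightarrow\; L' = L
```

Under this assumption both the offset b and the gain a drop out of the label, whereas any measure that keeps the magnitude of the difference would still scale with a.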

Furthermore, the SBGP discards non-structural patterns,
which contain non-smooth or discontinuous changes
of local pixels. These patterns are often caused by noise
or outliers, and contain little structural and meaningful
information. The experimental statistics on 5032 face images
from the AR and YaleB databases show that there is
a very low proportion of these non-structural patterns in
general face images, only 5.8%, lower than the proportion
of LBP non-uniform patterns, at about 8.5%. In addition,
some LBP uniform labels have small numbers of patterns.
For example, UP00 and UP08, as shown in Fig. 4(b)
and Fig. 5 (middle row), detect bright and dark spots, respectively,
in textural images [15]. These types of patterns
may also include some irregular appearances, such as noisy
spots and corrupted pixels.

As shown in Fig. 8(b), the SBGP non-structural patterns
contain a large amount of noise. The SBGP discards
them to mitigate the effect of noise, whereas the LBP retains
all of its non-uniform patterns and assigns additional labels
to them. Consequently, the numbers of LBP non-uniform,
spot and corner patterns increase dramatically when noise
is present, as shown in Fig. 8(c)-(e).

4.3. Complexity

Complexity often refers to computational speed and
storage demand. For SBGP and LBP, the computational
speed depends on the number of binary correlations and
the number of resulting (principal) binary numbers (labels),
both of which are determined by the number of
neighbors/directions. The computation of Gabor features
depends on the number of convolutions, applying multiple
Gabor kernels with various scales and orientations to a set
of local pixels. So, the speed is determined by the numbers
and sizes of Gabor kernels. The storage demands of these
three descriptors are measured by the feature dimensions.
As mentioned, the dimensions of local histogram based features
are computed by multiplying the numbers of labels
(bins) and blocks.

Figure 8: Gaussian noise on the SBGP non-structural and LBP non-uniform
patterns. (a) Original face and face with added Gaussian noise;
(b) the SBGP non-structural and (c) LBP non-uniform patterns; (d)
the LBP UP00 and UP08 patterns (spots); and (e) the LBP UP01
and UP07 patterns (corners).

In this comparison, the numbers of neighbors were set
to their maximum with respect to the radii for both
binary descriptors, P = 8R, e.g. (8, 1), (16, 2) and (24, 3).
The Gabor faces were computed using their typical parameter
setting with a kernel size of 31 × 31, eight
orientations and five scales [16]. We computed the average
running time per face and the feature dimensions, together
with the numbers of computational units for a given pixel
and the numbers of labels generated by the binary descriptors.
The experimental results on the AR and YaleB face
databases (5053 faces in total) are given in Table 1. The
image size was 100 × 100 and the numbers of blocks used
by LBP and SBGP were the same, 36. The experiments
were run on a typical PC with an AMD dual-core processor
at 2.2 GHz and 2.0 GB of RAM. SBGP was run with our unoptimized
MATLAB code. The LBP code was from the authors
of [15, 29] (also in MATLAB). The Gabor representation was
run based on the MATLAB codes of [33, 16], which integrate
C/C++ code for computing the convolutions.

As can be seen, the complexities of the two binary
descriptors are significantly lower than that of Gabor. The
ratios of computational cost, execution time and final
feature dimension between the Gabor feature and the basic SBGP
descriptor are about 4800:1, 300:1 and 1400:1, respectively.
Compared to LBP, the costs and execution times of SBGP
are about half of those of LBP at all resolutions.
Running a basic SBGP operator on a regular face image takes






















Table 1: Complexity of SBGP, LBP and Gabor features.

Descriptor                     comp. units   time (s)   labels   dimensions
Gabor                                38440     0.9699        -       400000
(P,R) = (24,3)   LBP^{u2}               48     0.0189      555        19980
                 LBP^{riu2}             48     0.0188       26          936
                 SBGP                   24     0.0097       24          864
(P,R) = (16,2)   LBP^{u2}               32     0.0128      243         8748
                 LBP^{riu2}             32     0.0124       18          648
                 SBGP                   16     0.0065       16          576
(P,R) = (8,1)    LBP^{u2}               16     0.0056       59         2124
                 LBP^{riu2}             16     0.0055       10          360
                 SBGP                    8     0.0032        8          288
Figure 9: Framework of SBGPM descriptor.


only 0.0032 s, making it suitable for real-time applications.
Furthermore, SBGP uses even fewer pattern labels
than the LBP descriptor, i.e. its features have much lower
dimensionality. For example, the dimensions of SBGP features
are only about 13.6%, 6.6% and 4.3% of those of LBP^{u2} in
the three spatial resolutions. Compared to CS-LBP [26]
in the (24,3) case, the CS-LBP dimension
increases to 2^{12} x 36 ≈ 1.5 x 10^{5}, which is more than
170 times that of our SBGP. Hence, the SBGP descriptor is
extremely efficient and compact.
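
The ratios quoted in this subsection follow from the entries of Table 1. As a quick check, the short Python sketch below recomputes them. The numbers are copied from the table; the dictionary layout and function names are illustrative choices and not part of the SBGP code, and the CS-LBP label count of 2^(P/2) is assumed to be the usual count for a center-symmetric pattern over P neighbors.

```python
# Values copied from Table 1; all names below are illustrative only.
BLOCKS = 36  # number of blocks used by LBP and SBGP in this comparison

TABLE1_LABELS = {
    (8, 1): {"LBP_u2": 59, "LBP_riu2": 10, "SBGP": 8},
    (16, 2): {"LBP_u2": 243, "LBP_riu2": 18, "SBGP": 16},
    (24, 3): {"LBP_u2": 555, "LBP_riu2": 26, "SBGP": 24},
}


def feature_dimension(num_labels, num_blocks=BLOCKS):
    """Dimension of a local-histogram feature: labels (bins) x blocks."""
    return num_labels * num_blocks


def sbgp_share_of_lbp_u2(resolution):
    """SBGP feature dimension as a percentage of the LBP^{u2} dimension."""
    row = TABLE1_LABELS[resolution]
    return 100.0 * feature_dimension(row["SBGP"]) / feature_dimension(row["LBP_u2"])


# Gabor versus the basic SBGP at (P,R) = (8,1).
GABOR = {"units": 38440, "time_s": 0.9699, "dimension": 400000}
SBGP_8_1 = {"units": 8, "time_s": 0.0032, "dimension": feature_dimension(8)}
gabor_to_sbgp = {key: GABOR[key] / SBGP_8_1[key] for key in GABOR}

# CS-LBP at (24,3): assumed 2^(P/2) labels per block.
cs_lbp_dimension = 2 ** (24 // 2) * BLOCKS

for resolution in TABLE1_LABELS:
    print(resolution, round(sbgp_share_of_lbp_u2(resolution), 1), "% of LBP^{u2}")
print({key: round(value) for key, value in gabor_to_sbgp.items()})
print(cs_lbp_dimension, round(cs_lbp_dimension / feature_dimension(24), 1))
```

The check uses only the label counts and block number, so it applies unchanged to any other block grid by changing `BLOCKS`.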

5. SBGP on Orientational Image Gradient Magnitude

Various extensions of Gabor and LBP representations
have helped to yield some state-of-the-art local facial
representations, for example by combining both properties
and enforcing spatial locality, orientation and robustness.
Improving on these methods, we propose a framework that
applies the SBGP descriptor to orientational IGM (OIGM)
images, abbreviated as SBGPM, to enhance the discriminative
power by further enforcing spatial locality and orientation.
The framework of SBGPM is depicted in Fig. 9 and its details
are given in the following steps:

1). Compute IGO and IGM from a given face image.

2). Generate a set of OIGM images from IGO and
IGM. Similar to the orientations computed by WLD [19],
IGO is first quantized into a number of dominant orientations.
Then, an OIGM image, corresponding to a certain
dominant orientation, is generated by computing the average
values of the IGMs of this dominant orientation in defined
neighborhoods (e.g. with sizes of 7 x 7, referred to as the
local resolution of OIGM),

M^{t}_{i,j} = \frac{1}{n} \sum_{k \in \Omega^{t}_{i,j}} M_k, \quad t = 1, 2, \ldots, s \qquad (7)

where (i,j) is the location index of the given pixel, M_k
is the IGM value at location k, with k representing a 2D index
such as (i,j), \Omega^{t}_{i,j} is the set of location indices
corresponding to the t-th dominant orientation in the defined
neighborhood of the given pixel, and n is the number of pixels
in this neighborhood, e.g. n = 7 x 7 = 49.

3). Run SBGP on the OIGM images to yield a set of 
SBGPM images, on which local histogram is computed to 
generate the final feature vector. 
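
Steps 1) and 2) can be sketched in a few lines of NumPy. The following is only an illustration under stated assumptions, not the authors' implementation: gradients are taken with `np.gradient`, IGO is quantized into equal angular bins over [0, 2*pi), and borders are zero-padded so that n stays fixed at the window size. The function name `oigm_images` and its parameters are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def oigm_images(image, num_orientations=3, window=7):
    """Return an array of shape (num_orientations, H, W) of OIGM images."""
    img = np.asarray(image, dtype=np.float64)

    # Step 1: image gradient magnitude (IGM) and orientation (IGO).
    gy, gx = np.gradient(img)                       # derivatives along rows, columns
    igm = np.hypot(gx, gy)
    igo = np.mod(np.arctan2(gy, gx), 2.0 * np.pi)   # angles mapped to [0, 2*pi)

    # Step 2a: quantize IGO into dominant orientations (assumed: equal bins).
    bins = np.floor(igo / (2.0 * np.pi) * num_orientations).astype(int)
    bins = np.minimum(bins, num_orientations - 1)   # guard against rounding to 2*pi

    # Step 2b: Eq. (7), sum the IGM of orientation t in the window, divide by n.
    pad = window // 2
    n = window * window
    out = np.empty((num_orientations,) + img.shape)
    for t in range(num_orientations):
        masked = np.where(bins == t, igm, 0.0)      # keep IGM of orientation t only
        padded = np.pad(masked, pad, mode="constant")
        windows = sliding_window_view(padded, (window, window))
        out[t] = windows.sum(axis=(-2, -1)) / n
    return out


if __name__ == "__main__":
    face = np.random.default_rng(0).random((100, 100))  # stand-in for a face image
    print(oigm_images(face).shape)
```

Each returned image would then be passed to the SBGP operator (step 3), which is defined earlier in the paper and is not reproduced here.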

In the SBGPM framework, the strength of edge information
is enforced by using the IGM image instead of the
intensity image. The framework produces stronger orientational
power by generating the OIGM images from different discrete
dominant orientations, and further enforces spatial locality
by computing the average IGM values in a certain local
resolution. SBGPM gains greater discriminative ability from
these enhancements at the cost of an acceptable increase in
complexity. Its complexity nevertheless remains low, unlike
the Gabor representation, which requires a large number of
Gabor faces (typically 40) and a large convolution
neighborhood (e.g. 31 x 31). In our experiments, the typical
number of OIGM images and their local resolution are 3 and
7 x 7, respectively, leading to only 147 additional
computational units (less than 0.4% of those of Gabor
faces) and 3 times the dimensions (compared to 40 times for
Gabor based fusion models).
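
The overhead figures above are simple arithmetic on the stated settings. A small Python check, assuming the Gabor cost of 38440 computational units per pixel listed in Table 1 (variable names are illustrative):

```python
# Settings stated in the text; names are illustrative only.
num_oigm = 3            # typical number of OIGM images
window = 7              # local resolution of OIGM (7 x 7)
gabor_units = 38440     # computational units per pixel for Gabor (Table 1)

extra_units = num_oigm * window * window          # averaging cost per pixel
share_of_gabor = 100.0 * extra_units / gabor_units

print(extra_units, round(share_of_gabor, 2))
```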

6. Robust Face Recognition 

We systematically evaluated the performance of SBGP
based descriptors for facial representation and their
robustness against multiple variations such as changes of
lighting, expression, occlusion and age. Two groups of
experiments were conducted. First, the performance of the
SBGP was compared to the basic LBP and Gabor features,
together with discussions on parameter selection. Second,
the capability of the SBGP and SBGPM descriptors was
further evaluated by comparing with recent methods on
four publicly available databases: the AR [34, 35],
(extended) YaleB [28, 36], FERET [37] and Labeled Faces in
the Wild (LFW) [38] databases. For unbiased comparisons
and reliable results, all implemented methods operated
directly on the raw face images without any pre-processing,
such as DoG filtering, Gamma correction and
lighting equalization, some of which may prominently
affect the experimental results.

Figure 10: Face examples: (a)-(e) groups 1-5 of the YaleB faces;
(f)-(j) AR faces with Neutral, Expression, Lighting, Sunglass &
Lighting and Scarf & Lighting.

1). The YaleB database contains about 22000 face
images of 38 subjects, with 9 different poses and 64
illumination conditions for each subject. A widely used
subset [28, 36], which includes all faces from the frontal
pose (64 x 38 = 2432), was used in the experiments
for testing the robustness to illumination variations. The
dataset was divided into five groups with increasing
effect of illumination, according to [36]. Exemplar faces
are shown in Fig. 10.


2). The AR database consists of over 4000 images of
126 subjects, each having 26 facial images taken in two
different sessions separated by two weeks. Each session has 13
images with multiple variations in expression, lighting and
occlusion (sunglasses and/or scarf). A subset of faces cropped
by its original authors [35], covering 50 male and 50 female
subjects, was used in the experiments. Examples of these
variations are shown in Fig. 10.

3). The FERET database [37] has five subsets, including
a gallery set (Fa) and four probe sets (Fb, Fc, DupI
and DupII). The gallery set contains 1196 frontal images
of 1196 subjects. The DupI and DupII sets, including
722 and 234 face images respectively, have proven
extremely challenging due to significant appearance
variations caused by aging. Our experiments were conducted on
both challenge sets. Following most existing methods, we
cropped the original images into smaller sizes (140 x 120)
according to the available eye coordinates, but without
any further pre-processing.

4). The LFW dataset [38] contains 13233 natural face
images of 5749 people, collected in unconstrained
environments from the web. They have large real-world
variations in expression, lighting, pose, age, gender and even
image scale and quality. The evaluation followed the
standard image-restricted test setting, which verifies whether
a pair of faces is from the same person. Our methods were
evaluated on the widely used View 2 set, containing ten
non-overlapping subsets, each having 600 pairs of images (300
matching and 300 non-matching pairs). Following
previous work [39, 11], we simply cropped out the main face
area of size 150 x 80 from the images provided by Wolf et
al. [40].

6.1. SBGP, LBP and Gabor Representation

We investigated the performance of three fundamental
descriptors for face recognition. The proposed SBGP
and the rotation invariant uniform LBP (LBP^{riu2}) were
implemented in the local histogram model as in [29]. Both
methods have only two parameters: the spatial resolution,
(P, R), and the number of blocks. The nearest
neighbor (NN) classifier was used to assign the label of
the most similar gallery image to the probe face.
Similarities between feature vectors were computed by histogram
intersection for LBP and SBGP, and by Euclidean distance
for the Gabor representation [16].
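
The local histogram model and the NN rule described above can be sketched as follows. This is a minimal illustration, assuming that a descriptor has already produced an integer label image (labels in 0..num_labels-1) and that the image is split into a regular grid of blocks; the function names and the 6 x 6 grid (36 blocks, as in Table 1) are illustrative.

```python
import numpy as np


def block_histogram(label_image, num_labels, blocks=(6, 6)):
    """Concatenate per-block label histograms into one feature vector."""
    feats = []
    for band in np.array_split(label_image, blocks[0], axis=0):
        for block in np.array_split(band, blocks[1], axis=1):
            feats.append(np.bincount(block.ravel(), minlength=num_labels))
    return np.concatenate(feats).astype(np.float64)


def histogram_intersection(h1, h2):
    """Similarity between two histograms: sum of bin-wise minima."""
    return float(np.minimum(h1, h2).sum())


def nearest_neighbor(probe, gallery, gallery_ids):
    """Return the identity of the gallery feature most similar to the probe."""
    scores = [histogram_intersection(probe, g) for g in gallery]
    return gallery_ids[int(np.argmax(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for label images with 16 labels, as for SBGP at (16,2).
    gallery_imgs = [rng.integers(0, 16, size=(100, 100)) for _ in range(3)]
    gallery = [block_histogram(g, 16) for g in gallery_imgs]
    probe = block_histogram(gallery_imgs[1], 16)
    print(gallery[0].shape, nearest_neighbor(probe, gallery, ["s0", "s1", "s2"]))
```

With 16 labels and 36 blocks the feature has 576 entries, matching the SBGP dimension listed for (16,2) in Table 1.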

Their performances were evaluated on the YaleB and
the AR databases. On the YaleB, a single face with the
natural illumination condition ("A+000E+00") per subject
was used as the gallery image. All five groups with different
levels of illumination effects were tested. For the AR, the
gallery images were the natural faces from session one
(also a single face per subject), and the probe images were
grouped as expression, lighting, sunglass & lighting and
scarf & lighting, each including 600 images. The results
are presented in Fig. 11.

The recognition rates of all three methods reached 100%
in groups one and two of the YaleB database. Fig. 11 (top
row) shows that the LBP and Gabor descriptors had similar
performances, which were reasonable in group three
with a medium level of illumination effects but deteriorated
drastically with increased illumination effects in groups
four and five. In contrast, SBGP was consistently excellent
(with recognition rates above 90%) even under the
extreme lighting conditions of group five. The improvements
of SBGP in groups four and five were highly significant,
outperforming LBP or Gabor by more than 30% and 70%,
respectively. This demonstrates its enhanced robustness
against illumination variation, obtained by learning local
textural structures from the image gradient domain, which
inherently contain gray-scale invariant features.

Similarly, the performance of SBGP was the best in all
test groups on the AR database (shown in Fig. 11, bottom
row). It gave more than 90% recognition rates in
the groups of expression, lighting, and sunglass & lighting,
and above 80% for the groups seriously affected by both
large-scale occlusions and illuminations. Again, the
performances of LBP and Gabor were substantially affected
by multiple variations in the last two groups. The excellent
results of SBGP show that it is highly discriminative
and robust to multiple facial variations.

It has also been found that the SBGP descriptor is
fairly insensitive to the choice of its parameters. First,
the overall performance of SBGP is stable across spatial
resolutions, especially for (16,2) and (24,3). By contrast,
changes in spatial resolution cause large differences
in the recognition rate of LBP. Second, the performance of
both descriptors can be improved by increasing the number
of blocks. As can be seen, in most test groups the
recognition rates of SBGP become stable when the number
of blocks is equal to or greater than 12 x 12, except for
groups four and five of the YaleB, which require larger
numbers of blocks to alleviate the effect of severe
illumination conditions. Therefore, by trading off performance
and computational complexity, the spatial resolution of the
SBGP was set to (16,2) in all our experiments, while the
numbers of blocks were determined by the sizes of images.

Figure 11: Performance of SBGP, LBP and Gabor on YaleB (top) and AR (bottom).

6.2. Lighting and Multiple Variations 

The efficiency of the proposed SBGP based descriptors
was further evaluated by comparing with recent methods,
including IGO based methods (Gradientfaces and
IGOPCA [9]), local feature methods (CS-LBP [26], WLD [19, 41],
LTP [20], POEM [11] and Volterrafaces [30]), and fusions
of both (LGOBP [42] and PHOG), along with recent
results directly quoted from the related literature.

For SBGPM, the number of OIGM images and the local resolution
were set to 3 and 7 x 7 in all experiments. For a
fair and unbiased comparison, all implemented methods
employed the optimal parameters and similarity measures
suggested by their original authors. For IGOPCA, the
number of reduced dimensions was varied from 10 to its
maximum. The implementation of WLD followed
[41], with the number of quantized orientations set to 8
and the differential excitation value varied among {32,48,64}.
The CS-LBP was implemented with 8 neighbors, a radius
of 2 and a binary threshold of 0.01, as suggested in
[26]. The number of bins for local IGO based methods
(PHOG and LGOBP) was varied within [4,40]. All local
histogram methods (SBGP, SBGPM, CS-LBP, WLD,
LTP, POEM, and LGOBP) were run by varying the numbers
of blocks from 8 x 8 to 36 x 36. The pyramid level for
the PHOG was optimized from 1 to 5. Finally, the best
performance of each method over three widely used
similarity measures, namely histogram intersection, \chi^2 [20]
and Euclidean distance, was reported. The SBGP methods
used histogram intersection.
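
For reference, the two distance measures named above besides histogram intersection can be written as below. This is a sketch under assumptions: the \chi^2 statistic is taken in the common form sum((a - b)^2 / (a + b)) without a factor of 1/2, and a small `eps` (an illustrative safeguard, not from the paper) avoids division by zero in empty bins. Unlike histogram intersection, both are dissimilarities, so the nearest gallery image is the one with the smallest value.

```python
import numpy as np


def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (assumed form, no 1/2 factor)."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))


def euclidean(h1, h2):
    """Euclidean distance between two feature vectors."""
    diff = np.asarray(h1, dtype=np.float64) - np.asarray(h2, dtype=np.float64)
    return float(np.linalg.norm(diff))


if __name__ == "__main__":
    a, b = [4.0, 0.0, 6.0], [2.0, 2.0, 6.0]
    print(chi_square(a, b), euclidean(a, b))
```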

6.2.1. Illumination Variation 

This experiment evaluated the illumination invariant
property of SBGP-based methods by using the same
gallery and probe images as in the previous experiment
on the YaleB database. To provide more comprehensive
results, the comparisons also included a group of methods
based on the reflectance model, specially developed to
address the illumination effect. These methods include
the logarithm total variation (LTV) model [24], the logarithmic
wavelet transform (LWT) [44], Multi-linear
Principal Component Analysis applied to tensors-GT histograms
(TGT-MPCA) [23] and the reconstruction with normalized
large- and small-scale feature images (RLS) [22]. The
results are presented in Table 2.

As one can see, IGO based methods yielded better
overall performance than the intensity based methods (local
features and reflectance models). As expected, SBGP
based methods achieved the best overall performance.
Even the basic SBGP had a lower average error rate than all
other methods, and the SBGPM had the lowest average error
rate at only 2.1%, which is about one tenth of the errors of
most other methods. The largest improvements were in the
severe illumination conditions of groups four and five, in
which only 1.9% and 2.7% errors occurred, respectively. The
best performances of reflectance models and IGO methods were
about 12% in group four and around 15% for group five. The
improvements are statistically significant, showing the exceptional
Table 2: Performance of single training sample per person on YaleB
database.

                         Error Rate (%)
Method            Group 3   Group 4   Group 5   Avg.
LTV [24]             21.5      24.2      17.6   20.7
LWT [44]             18.0      18.0      29.2   22.7
TGT-MPCA [23]         5.3      39.9             23.9
RLS [22]             14.0      14.7      15.2   14.7
LGOBP                13.4      48.3      67.9   47.3
WLD                   1.1      15.3      60.5   30.6
LTP                   2.4      16.2      39.5   22.4
CS-LBP                4.4      39.9      85.7   49.3
PHOG                  6.4      54.2      77.9   51.5
POEM                  4.8      11.3      40.4   21.9
Volterrafaces         6.6      32.3      17.4   19.3
Gradientfaces         8.4      12.6      17.2   13.4
IGOPCA               10.6      12.0      28.2   18.5
SBGP                  2.8       9.2      12.9    9.1
SBGPM                 1.3       1.9       2.7    2.1

robustness of SBGPM against illumination. 

Furthermore, WLD and LTP, extended from LBP, yielded
low error rates in the low and medium levels of illumination
change, groups three and four. But their errors increase
drastically in the extreme illumination conditions
of group five. Note that the CS-LBP is robust to neither
medium nor extreme illumination conditions. This may be
due partly to its noise patterns, and partly to thresholding
the intensity differences, which was originally developed
to achieve robustness on flat image regions [26].
By contrast, the proposed SBGP methods consistently
excelled in extreme illumination conditions. These results,
along with the previous experiments, further verify that
gray-scale invariance achieved in the gradient domain by
discarding gradient contrast is stronger than that realized
by discarding intensity contrast.

6.2.2. Multiple Variations 

The robustness against multiple variations was analyzed
on the AR database. The experiments were divided
into two groups with different training schemes: a single
training sample per person and multiple training samples
per person.

EXP I: A Single Training Sample Per Person 

We used one neutral face per subject (N) in the first
session as the gallery image and tested all other faces: the
4 remaining groups in session one (E, L, GL and SL) and 5
groups in session two (N, E, L, GL and SL). Results are
presented in Table 3. Recently published results achieved
with the same experimental scheme, such as DMMA [45] and
ESRC-Gabor [46], are also included for comparison. These
two methods build learning models on the local features.

In contrast to the YaleB results, the local feature methods
outperformed the IGO based holistic representations. Again,
SBGP methods had the best overall performance among all
implementations. SBGPM had the lowest error rates in all tests
and its average error rates were less than 1% and 10% for sessions


Table 3: Performance of single training sample per person on AR
database.

                              Error Rate (%)
Method              N      E      L      GL     SL     Ave.

session one
UP [47]             -      18.0   -      -      -      -
DMMA [45]           -      13.0   -      -      -      -
ESRC-G [46]^a       -      5.8    0.0    7.1    5.0
LGOBP               -      12.3   8.7    10.7   65.0   24.2
WLD                 -      3.0    6.0    3.3    7.3    5.4
LTP                 -      2.7    1.3    2.0    9.7    3.9
CS-LBP              -      4.0    1.0    1.7    8.0    3.7
PHOG                -      4.7    1.0    3.3    6.7    3.9
POEM                -      5.0    0.0    3.7    5.3    3.5
Gradientfaces       -      16.3   3.0    8.0    22.3   12.4
IGOPCA              -      15.3   3.0    9.3    15.7   10.8
SBGP                -      2.0    1.7    2.7    6.7    3.0
SBGPM               -      0.7    0.0    1.0    2.3    0.9

session two
UP [47]             23     39.7   -      -      -      -
DMMA [45]           12     30.3   -      -      -      -
ESRC-G [46]^a       -      -      -      -      -      -
LGOBP               9      33.7   37.7   49.0   85.3   51.1
WLD                 7      20.7   20     24.3   27.3   21.9
LTP                 2      18.7   10.3   16.0   30.0   17.5
CS-LBP              3      18.3   11.3   16.3   25.7   16.8
PHOG                2      22.3   8.3    17.0   23.3   16.5
POEM                2      21.7   8.7    17.3   24.3   16.8
Gradientfaces       8      36.7   14.0   28.3   45.0   29.2
IGOPCA              3      28.7   10.3   22.7   34.7   22.5
SBGP                3      17.7   9.3    13.3   25.0   15.3
SBGPM               2      14.7   3.3    13.0   10.3   9.7

a Only 80 subjects for test and the other 20 for training.


one and two, respectively, significantly surpassing the
closest performance of 3.5% (by POEM) in session one and
16.5% (by PHOG) in session two.

It can be seen from the table that the main gain of the
local feature methods (e.g. POEM and LTP) over IGO
methods lies in the tests of Expression (E) and Scarf &
Lighting (SL), both of which cause large-scale local distortions
and can lead to significant differences in performance between
local feature and holistic methods. By integrating both
approaches, the SBGPM has not only yielded excellent
performance in single variations, but also achieved very
low error rates even in the most severe cases affected by
both large-scale scarf occlusion and lighting.

EXP II: Multiple Training Samples Per Person 

We further compared the performance of the SBGP
based methods with recent local fusion models and SRC
based methods by using their latest published results on
the AR database, under the four different implementations
used by the related publications. The implementations are
described as follows and the results are shown in Table 4.

Implementation A followed the experimental setting
of [25], using two neutral faces from both sessions
as gallery images and testing all four variations, namely
expression (E), lighting (L), sunglass & lighting (GL)
and scarf & lighting (SL), each having six faces per subject.
The SBGP methods were evaluated against two fusion
models integrating LBP and Gabor representations.
The SBGPM achieved perfect performance in the lighting
group, for which results were not reported in [25] for the
compared methods. The largest improvement lies in the group
of sunglass & lighting, with only 0.3% error for
SBGPM compared to 46.1% for the best of the compared
methods.

Implementation B evaluated the performance on
variations of expression & lighting (E&L) and occlusions. For
E&L, seven faces per subject from session one (one
neutral, three expression and three lighting faces) were used
for training, and the corresponding seven faces from
session two were tested. For occlusions, eight faces per subject,
including two neutral and six expression faces from both
sessions, were used as gallery images, and two faces with
sunglasses or scarves from both sessions were tested. The
SRC-based approaches achieved low error rates for variations
in expression or lighting. However, their performance
suffered seriously under large-scale occlusions such as scarves.
A common remedy for mitigating this effect is to manually
partition a face image into a number of regions and
discard the occluded parts. The GRRC with partitions
improved the performance substantially, with very low error
rates of 2.7%, 0.0% and 1.0% [48]. The SBGPM consistently
matched or exceeded these, and obtained almost perfect
performance with 0.3%, 0.0% and 0.0% error rates. Note that
the performance of SRC approaches strongly depends on
the manual partition scheme, while the SBGP methods are
completely automatic.

Implementation C trained on seven non-occluded 
faces from session one (as in Implementation B) and tested 
on four sets of occluded faces. Each set contained three 
faces per subject, with sunglasses or scarves, including 
multiple effects by lighting, in session one or two. Four 
sets are indicated as GL[S1], GL[S2], SL[S1] and SL[S2] in 
the table. 

Implementation D consisted of three separate
experiments following [49]. The first trained on seven
non-occluded faces and one sunglass face (randomly
selected from the three in session one) and tested on seven
non-occluded faces from session two and the remaining five
sunglass faces in both sessions. The second experiment
applied a similar training/test scheme to the faces with
scarf occlusions. The last one evaluated both sunglass and
scarf occlusions by using nine faces for training (seven
non-occluded faces plus one random sunglass face and one scarf
face from session one) and seventeen faces in total for
testing, including seven non-occluded faces from session two
and the remaining five sunglass and five scarf faces. The
results of the SBGP methods were the average error rates
over three, three and six cross-validated selections of training
sets.

Implementations C and D tested the SBGP methods
on more complex variations such as sunglass & lighting, or
scarf & lighting, which had rarely been evaluated by the
SRC-based methods. Clearly, complex variations do not
seem to hinder the capabilities of the SBGP


Table 4: Comparisons with recent local fusion models (Implementation
A [25]) and SRC based methods (Implementations B [48], C [48]
and D [49]) from published results.

                              Error Rate (%)

Implementation A    E        L        GL       SL
LGBP-M              13.9              62.4     17.4
LGBP-P [23]         14.1              63.0     16.5
GVLBP-M [25]        9.4               46.1     12.6
GVLBP-P [25]        8.9               53.9     9.6
SBGP                2.5      0.5      1.0      4.7
SBGPM               2.2      0.0      0.3      1.2

Implementation B    E&L      Sunglass       Scarf
SRC [50]            5.3      13.0 (2.5)     40.5 (6.5)
LRC                 23.3     4.0 (-)        74.0 (4.5)
CESR [52]                    30.0 (-)       - (1.7)
CRC_RLS             6.3      31.5 (8.5)     9.5 (5.0)
RSC [53]            10.0
RCR                 4.1      - (1.5)        - (3.5)
GRRC [48]           2.7      7.0 (0.0)      21.0 (1.0)
SBGP                2.0      0.5            1.0
SBGPM               0.3      0.0            0.0

Implementation C    GL[S1]   GL[S2]   SL[S1]   SL[S2]
SRC [50]            16.7     51.3     51.0     71.0
CRC_RLS             22.0     47.7     55.3     70.7
GRRC [48]           7.7      48.3     5.0      15.7
SBGP                0.0      6.0      2.3      11.0
SBGPM               0.3      2.3      0.3      4.3

Implementation D    EGL      ESL      EGSL
SRC [50]            15.8     23.7     22.0
LLC [56]            15.5     23.4     21.0
LR [19]             14.6     15.6     18.4
SBGP                2.6      3.7      3.6
SBGPM               1.6      1.2      1.9

a The error rates presented in parentheses were achieved by using a
manual partition scheme.

b SIFT [27], and its extension, Partial-Descriptor-SIFT, were tested
on this group with error rates of 6.1% and 4.5% in [57].

and SBGPM in these implementations, while the performance
of SRC-based approaches degraded severely. SBGPM
again achieved extremely low average error rates, 1.8%
and 1.6% for implementations C and D, a fraction of the
average error rates of the best of the compared methods
(19.2% by GRRC and 16.2% by LR).
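
The average error rates quoted above are plain means over the test sets of Implementations C and D in Table 4. The following Python sketch reproduces them from the table entries; the data layout and function name are illustrative.

```python
# Error rates (%) copied from Table 4; the layout is illustrative only.
TABLE4 = {
    "C": {  # GL[S1], GL[S2], SL[S1], SL[S2]
        "GRRC": [7.7, 48.3, 5.0, 15.7],
        "SBGPM": [0.3, 2.3, 0.3, 4.3],
    },
    "D": {  # EGL, ESL, EGSL
        "LR": [14.6, 15.6, 18.4],
        "SBGPM": [1.6, 1.2, 1.9],
    },
}


def mean_error(rates):
    """Unweighted mean of a list of error rates."""
    return sum(rates) / len(rates)


for implementation, methods in TABLE4.items():
    for method, rates in methods.items():
        print(implementation, method, round(mean_error(rates), 1))
```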

6.3. Aging and Unconstrained Variations 

The performance of the SBGPM descriptor was further
investigated on age changes (DupI and DupII of the
FERET database) and unrestricted real-world images (the
LFW database). On these databases, our experiments
followed most previous work by using the square root of the
features for representation and the cosine distance as the
similarity measure [39, 11]. Note that good performance on
these two challenging databases relies heavily on sophisticated
machine learning models for learning high-level features
and on advanced classifiers. The development of advanced
machine learning methods is beyond the scope of this work.
For an unbiased evaluation, our descriptors were
compared to a set of manually designed features. In our
implementation, we applied Fisher's Linear Discriminant
Analysis (FDA) [3] for classification. Because the
Table 5: Performance on FERET and LFW for aging and unconstrained
variations (CR: Correct Rate).

FERET                                      LFW
Method               CR (%)                Method            CR (%)
                     DupI     DupII
FS-SIFT              61.0     53.0         -                 -
Gabor-WPCA           78.8     77.8         V1-like           64.2
LBP-WPCA             79.4     70.0         V1-like+          68.1
WLGBP                74.0     71.0         Gabor (C1) [39]   68.4
WHGPP                79.5     77.8         LBP [39]          67.9
G-LDP                78.8     77.8         FPLBP [39]        67.5
LGBP-WPCA [62]       83.8     81.6         TPLBP [39]        69.0
Zou's Result [18]    85.0     79.5         Comb. [39]^a      74.5
Tan's Result [63]    90.0     85.0         SIFT              69.9
POEM-WPCA [11]       88.8     85.0         POEM [11]         73.7
IGOPCA [9]           88.9     85.4         -                 -
SBGPM                94.3     89.7         SBGPM             78.7

a Fusion of multiple features: Gabor, LBP, FPLBP and TPLBP.

b The highest approximated results reported by curve, using
Gabor pre-processing.

c A pre-processing step was applied for getting higher performance,
i.e. Gamma correction and DoG filtering were used for Tan's result, and
Retina filtering was applied before POEM.

FDA cannot be applied directly for face verification on
the LFW, the performance on this dataset was evaluated
without any learning process. The correct rates of the
SBGPM on the two databases are compared to recently
published results in Table 5.
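
The square-root feature mapping and cosine similarity mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, `eps` is only a numerical safeguard, and the decision threshold used for verification is a free parameter that the paper does not specify.

```python
import numpy as np


def sqrt_features(hist):
    """Element-wise square root of a non-negative histogram feature."""
    return np.sqrt(np.asarray(hist, dtype=np.float64))


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def same_person(hist_a, hist_b, threshold):
    """Verification decision for a face pair; `threshold` is hypothetical."""
    score = cosine_similarity(sqrt_features(hist_a), sqrt_features(hist_b))
    return score >= threshold


if __name__ == "__main__":
    a = [9.0, 4.0, 0.0, 1.0]
    b = [4.0, 9.0, 1.0, 0.0]
    print(cosine_similarity(sqrt_features(a), sqrt_features(b)))
    print(same_person(a, a, threshold=0.9))
```

Taking the square root before the cosine measure reduces the influence of a few heavily populated histogram bins on the similarity score.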

The results show that the SBGPM achieved competitive
performance against recent descriptors, with correct rates
reaching 94.3% and 89.7% on DupI and DupII, respectively.
The margins between the SBGPM and the closest
methods on the list are about 4% on both subsets, which
is significant for this challenging dataset. Similarly, the
proposed descriptor obtained a 78.7% correct rate for face
verification on the LFW database, improving over
the closest single descriptor (POEM) by 5% and the best
multiple fusion descriptor by about 4% in correct rate.
The favorable performance of the SBGPM on these
challenging variations further illustrates its high discriminative
power and strong robustness for facial representation.

7. Conclusion 

This paper has introduced a novel framework for robust
facial representation. The proposed structural binary
gradient pattern (SBGP) effectively enforces spatial locality
in the gradient domain to enhance robustness against both
illumination and local distortions, while remaining compact
and computationally efficient by encoding local structures
into a set of binary patterns. Theoretical analysis shows
that the defined structural patterns of the SBGP act
as effective orientational micro edge detectors and
thus gain strong spatial locality and orientation
properties, leading to effective discrimination. Furthermore, the
SBGP is generic and suitable for building fusion models.
As an example, the enhanced SBGPM has also been
presented as the result of combining SBGP and orientational
image gradients. Extensive justifications and experimental
verifications demonstrate the efficiency of SBGP
and SBGPM, and their markedly improved recognition
performance over existing methods on a variety of
robustness tests against lighting, expression, occlusion and
aging.